7 research outputs found

A Multi-disciplinary Approach to Interactive Information Retrieval upon Semi-structured Data Sets

Author: Boscarino C. (Corrado)
Publication venue: The Chartered Institute for IT
Publication date: 01/09/2009

The so-called logic and probabilistic views on IR can be reconciled by a unifying framework for IIR. I present a proposal for PhD research from a multidisciplinary perspective and discuss some of its consequences for IR as a discipline.

Crossref

CWI's Institutional Repository

Prior Information and the Determination of Event Spaces in Probabilistic Information Retrieval Models

Author: Boscarino C. (Corrado)
Vries A.P. (Arjen) de
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/09/2009

A mismatch between different event spaces has been used to argue against rank equivalence of classic probabilistic models of information retrieval and language models. We question the effectiveness of this strategy and argue that a convincing solution should be sought in a correct procedure for designing adequate priors for probabilistic reasoning. Acknowledging our solution to the event space issue invites us to rethink the relation between probabilistic models, statistics and logic in the context of IR.

CWI's Institutional Repository

Search for journalists: New York Times challenge report

Author: Alink W. (Wouter)
Boscarino C. (Corrado)
Vries A.P. (Arjen) de
Publication venue: MSFT research
Publication date: 01/08/2010

We investigate how a user-centred design for search can improve the support of user tasks specific to journalism. Illustrated by example information needs, sampled from our own exploration of the New York Times annotated corpus, we demonstrate how domain-specific notions rooted in a field theory of journalism can be transformed into effective search strategies. We present a method for search-context-aware classification of authorities, witnesses, reporters and columnists. A first search strategy supports the journalistic task of investigating the trustworthiness of a news source, whereas the second search strategy supports assessments of the objectivity of an author. In principle, these strategies can exploit the semantic annotations of the corpus; however, based on our preliminary work with the corpus, we conclude that straightforward full-text search is still a crucial component of any effective search strategy, as only recent articles are annotated, and annotations are far from complete.

CWI's Institutional Repository

Implicit relevance feedback from a multi-step search process: a use of query-logs

Author: Boscarino C. (Corrado)
Hollink V. (Vera)
Ossenbruggen J.R. (Jacco) van
Vries A.P. (Arjen) de
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/03/2011

We evaluate the use of clickthrough information as implicit relevance feedback in sessions. We employ records of user interactions with a search system for picture retrieval: issued queries, clicked images, and purchased content. We investigate whether, and how much of, the past search history should be used in a feedback loop. We also assess the benefit of using clicked data as positive tokens of relevance for the task of estimating the probability that an image is purchased.

CWI's Institutional Repository

CWI at TREC 2012, KBA track and Session Track

Author: Araújo S. (Samur)
Boscarino C. (Corrado)
Gebremeskel G.G. (Gebre)
He J. (Jiyin)
Vries A.P. (Arjen) de
Publication venue: University of Aden - Faculty of Economics and Administration
Publication date: 01/02/2013

We participated in two tracks: the Knowledge Base Acceleration (KBA) Track and the Session Track. In the KBA track, we focused on experimenting with different approaches, as it is the first time the track is launched. We experimented with supervised and unsupervised retrieval models. Our supervised approaches include language models and a string-learning system. Our unsupervised approaches include using: 1) DBpedia labels and 2) Google-Cross-Lingual Dictionary (GCLD). While the approach that uses GCLD targets the central and relevant bins, all the rest target the central bin. The GCLD and the string-learning system have outperformed the others in their respective targeted bins. The goal of the Session track submission is to evaluate whether and how a logic framework for representing user interactions with an IR system can be used for improving the approximation of the relevant term distribution that another system that is supposed to have access to the session information will then calculate. the documents in the stream corpora. Three out of the seven runs used a Hadoop cluster provided by Sara.nl to process the stream corpora. The other four runs used federated access to the same corpora distributed among seven workstations.

CWI's Institutional Repository

CWI at TREC 2011: Session, Web, and Medical

Author: Boscarino C. (Corrado)
Cornacchia R. (Roberto)
He J. (Jiyin)
Hollink V. (Vera)
Vries A.P. (Arjen) de
Publication venue: University of Aden - Faculty of Economics and Administration
Publication date: 01/01/2011

CWI's Institutional Repository

Adapting Query Expansion to Search Proficiency

Author: Boscarino C. (Corrado)
Hollink V. (Vera)
Vries A.P. (Arjen) de
Publication venue: ECIR LNCS
Publication date: 01/04/2012

We argue that query expansion (QE) based on the full session improves the overall search experience, provided that we know how to adapt the QE weighting schema to a user's search proficiency. We propose a strategy to predict search ability from session parameters. Using an exponential model and these metrics, we set user-dependent QE coefficients. We evaluate this approach on TREC 2011 session track data.

CWI's Institutional Repository